Partially Distribution-Free Learning of Regular Languages from Positive Samples

Authors

  • Alexander Clark
  • Franck Thollard
Abstract

Regular languages are widely used in NLP today in spite of their shortcomings. Efficient algorithms that can reliably learn these languages, and which in realistic applications must use only positive samples, are necessary. These languages are not learnable under traditional distribution-free criteria. We claim that an appropriate learning framework is PAC learning where the distributions are constrained to be generated by a class of stochastic automata with support equal to the target concept. We discuss how this is related to other learning paradigms. We then present a simple learning algorithm for regular languages, and a self-contained proof that it learns according to this partially distribution-free criterion.
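Algorithms that learn regular languages from positive samples typically start by organising the samples into a prefix tree acceptor (PTA) whose transition frequencies drive later state-merging decisions. The sketch below is a minimal illustration of that common first step, not the authors' exact procedure; the function names and representation (states as prefix strings) are illustrative assumptions.

```python
from collections import defaultdict

def build_pta(samples):
    """Build a prefix tree acceptor (PTA) from positive samples.

    States are prefixes of the samples. Transition frequencies are
    recorded because stochastic state-merging learners use them to
    decide which states are distributionally similar enough to merge.
    """
    transitions = defaultdict(dict)   # state -> {symbol: next_state}
    counts = defaultdict(int)         # (state, symbol) -> frequency
    finals = defaultdict(int)         # state -> number of samples ending here
    for word in samples:
        state = ""                    # root state = empty prefix
        for symbol in word:
            nxt = state + symbol
            transitions[state][symbol] = nxt
            counts[(state, symbol)] += 1
            state = nxt
        finals[state] += 1
    return transitions, counts, finals

def accepts(transitions, finals, word):
    """Check whether the unmerged PTA accepts `word`."""
    state = ""
    for symbol in word:
        if symbol not in transitions.get(state, {}):
            return False
        state = transitions[state][symbol]
    return finals.get(state, 0) > 0
```

Before any merging, the PTA accepts exactly the sample set; generalisation to the full target language comes from the subsequent merging phase, which the frequency counts make statistically testable.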


Related articles

Grammatical Inference and the Argument from the Poverty of the Stimulus

Formal results in grammatical inference clearly have some relevance to first language acquisition. Initial formalisations of the problem (Gold 1967) are however inapplicable to this particular situation. In this paper we construct an appropriate formalisation of the problem using a modern vocabulary drawn from statistical learning theory and grammatical inference and looking in detail at the re...


Grammatical Inference And First Language Acquisition

One argument for parametric models of language has been learnability in the context of first language acquisition. The claim is made that “logical” arguments from learnability theory require non-trivial constraints on the class of languages. Initial formalisations of the problem (Gold, 1967) are however inapplicable to this particular situation. In this paper we construct an appropriate formali...


PAC-Learning Unambiguous NTS Languages

Non-terminally separated (NTS) languages are a subclass of deterministic context free languages where there is a stable relationship between the substrings of the language and the non-terminals of the grammar. We show that when the distribution of samples is generated by a PCFG, based on the same grammar as the target language, the class of unambiguous NTS languages is PAC-learnable from positi...


Using Contextual Representations to Efficiently Learn Context-Free Languages

We present a polynomial update time algorithm for the inductive inference of a large class of context-free languages using the paradigm of positive data and a membership oracle. We achieve this result by moving to a novel representation, called Contextual Binary Feature Grammars (CBFGs), which are capable of representing richly structured context-free languages as well as some context sensitive...


Characterizing PAC-Learnability of Semilinear Sets

The learnability of the class of letter-counts of regular languages (semilinear sets) and other related classes of subsets of N^d with respect to the distribution-free learning model of Valiant (PAC-learning model) is characterized. Using the notion of reducibility among learning problems due to Pitt and Warmuth called "prediction-preserving reducibility," and a special case thereof, a number o...



Journal:

Volume   Issue

Pages  -

Publication date: 2004